import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import least_squares
from valib import monteCarlo, bsCall, deltaHedge, monteCarloHedge
import odhlib as odh
from odhlib import MATURITY_BUCKETS_LABELS, maturity_category
plt.rcParams.update({
'text.usetex': True,
'font.family': 'serif',
'font.serif': ['Times New Roman'],
'figure.titlesize': 20,
'axes.labelsize': 12,
'axes.titlesize': 14,
'xtick.labelsize': 10,
'ytick.labelsize': 10,
'axes.grid': True,
'axes.axisbelow': True,
'grid.alpha': 0.5,
'axes.spines.right': False,
'axes.spines.top': False,
'figure.figsize': (10, 6),
'figure.dpi': 150,
})
pd.options.display.float_format = '{:,.3f}'.format
Volatility arbitrage¶
This section presents the execution and results of numerical simulations that validate the theoretical formulation for the mark-to-market profit obtained from delta hedging an underpriced option, as detailed in my CQF Final Project.
Monte Carlo simulations¶
To improve on the standard Monte Carlo method and reduce the variance of the results, several enhancements have been implemented:
- Euler-Maruyama scheme: a first-order approximation of the stochastic differential equation describing the geometric Brownian motion followed by the asset price;
- Milstein scheme: a second-order approximation of the SDE;
- Antithetic sampling: a variance reduction technique that pairs each Gaussian variate with its negation, so that the two draws are perfectly negatively correlated;
- Sobol' sequence: a low-discrepancy sequence enabling quasi-Monte Carlo simulations that converge faster than $\mathcal{O}(N^{-\frac{1}{2}})$;
- Brownian bridge: a construction of the discrete Wiener process that uses the very first Gaussian variate of each draw to fix the realisation at the final time, then fills in the intermediate time steps with the remaining variates.
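As a minimal, self-contained sketch of the Brownian bridge construction (independent of the `valib` implementation, whose internals are not shown here; the function name and interface are illustrative only):

```python
import numpy as np

def brownian_bridge(z, T):
    """Build Wiener paths from standard normal variates in bridge order.

    z has shape (n_paths, n_steps): the first variate of each row fixes
    W(T); the remaining variates fill the midpoints recursively, so the
    leading (best-distributed) dimensions control the largest scales.
    """
    n_paths, n = z.shape
    dt = T / n
    W = np.empty((n_paths, n + 1))
    W[:, 0] = 0.0
    W[:, n] = np.sqrt(T) * z[:, 0]      # terminal value set first
    k = 1
    queue = [(0, n)]                    # intervals still to bisect
    while queue:
        l, r = queue.pop(0)
        if r - l <= 1:
            continue
        m = (l + r) // 2
        tl, tm, tr = l * dt, m * dt, r * dt
        # conditional distribution of W(tm) given W(tl) and W(tr)
        mean = W[:, l] + (tm - tl) / (tr - tl) * (W[:, r] - W[:, l])
        var = (tm - tl) * (tr - tm) / (tr - tl)
        W[:, m] = mean + np.sqrt(var) * z[:, k]
        k += 1
        queue.append((l, m))
        queue.append((m, r))
    return W
```

In a quasi-Monte Carlo run, `z` would come from a Sobol' generator (e.g. `scipy.stats.qmc.Sobol`) mapped through the inverse normal CDF, so that the well-distributed leading dimensions determine the coarse shape of each path.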
Among all the variance reduction techniques implemented, the best results are obtained by combining the Sobol' sequence with the Brownian bridge. Three sets of Monte Carlo simulations are run with different time step sizes to show the effect of discrete hedging.
S0 = 100
mu = 0.1
actualVol = 0.3
daysToExpiry = 252
nPaths = 16384
daysPerYear = 252
nTimesteps = daysToExpiry
T = daysToExpiry / daysPerYear
dt = T / nTimesteps
simOpts_baseline = {'nTimesteps': nTimesteps, 'nPaths': nPaths, 'scheme': 'milstein', 'variates': 'sobol-bb', 'seed': 0}
simOpts_midFreq = {'nTimesteps': nTimesteps*8, 'nPaths': nPaths, 'scheme': 'milstein', 'variates': 'sobol-bb', 'seed': 0}
simOpts_highFreq = {'nTimesteps': nTimesteps*16, 'nPaths': nPaths, 'scheme': 'milstein', 'variates': 'sobol-bb', 'seed': 0}
# Generate the paths
sim1 = monteCarlo(initialPrice=S0, drift=mu, volatility=actualVol, simTime=T, **simOpts_baseline)
sim2 = monteCarlo(initialPrice=S0, drift=mu, volatility=actualVol, simTime=T, **simOpts_midFreq)
sim3 = monteCarlo(initialPrice=S0, drift=mu, volatility=actualVol, simTime=T, **simOpts_highFreq)
The figure below shows the first 100 realizations of the Monte Carlo simulation and the distribution of the asset prices at the final time.
fig, axs = sim1.plotPaths()
fig.suptitle("Monte Carlo simulation with Sobol' sequence and Brownian Bridge")
plt.show()
Implied volatility hedging is applied to the paths from the first Monte Carlo simulation.
# Define call and hedge specifications
K = 100
r = 0.05
impVol = actualVol*0.6
callSpecs = {'strike': K, 'riskFreeRate': r, 'impVol': impVol,}
hedgeSpecs = {'hedgeVol': impVol}
impVol_MC = monteCarloHedge(sim1, callSpecs, hedgeSpecs)
100%|██████████| 16384/16384 [00:10<00:00, 1537.88it/s]
The evolution over time of the profits gained from the volatility arbitrage opportunity is plotted on the left side, together with the histogram representing the distribution of final profits.
fig, ax = impVol_MC.plotPnl()
fig.suptitle('Monte Carlo simulation of implied volatility hedging')
plt.show()
Simulating with smaller time steps and increased hedging frequency shows that rebalancing more often while gamma is large is crucial to lock in more profit and to prevent the time decay effect from prevailing.
hedgeSpecs = {'hedgeVol': impVol}
impVol_MC2 = monteCarloHedge(sim2, callSpecs, hedgeSpecs)
100%|██████████| 16384/16384 [01:02<00:00, 264.10it/s]
fig, ax = impVol_MC2.plotPnl()
fig.suptitle('Monte Carlo simulation of implied volatility hedging applied 8 times a day')
plt.show()
The following Monte Carlo simulations of actual volatility hedging show that final profits are distributed around the theoretical expected value and that the deviation from it depends on the hedging frequency.
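The theoretical expected value referred to here is, when hedging with actual volatility, the difference between the option values at actual and implied volatility. A standalone Black-Scholes pricer is sketched below rather than using `valib`'s `bsCall`, whose exact signature is not shown in this notebook:

```python
import numpy as np
from scipy.stats import norm

def bs_call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Present value of the guaranteed profit from hedging with actual
# volatility: option value at actual vol minus value at implied vol.
S0, K, T, r = 100, 100, 1.0, 0.05
actualVol, impVol = 0.3, 0.3 * 0.6
expected_profit = bs_call_price(S0, K, T, r, actualVol) - bs_call_price(S0, K, T, r, impVol)
print(f'{expected_profit:.2f}')
```

With the parameters of this notebook, the final profits of the actual-volatility hedge should scatter around this value, with a spread that shrinks as the hedging frequency increases.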
hedgeSpecs = {'hedgeVol': actualVol}
actVol_MC = monteCarloHedge(sim1, callSpecs, hedgeSpecs)
100%|██████████| 16384/16384 [00:12<00:00, 1282.11it/s]
fig,axs = actVol_MC.plotPnl()
fig.suptitle('Monte Carlo simulation of actual volatility hedging')
plt.show()
hedgeSpecs = {'hedgeVol': actualVol}
actVol_MC2 = monteCarloHedge(sim2, callSpecs, hedgeSpecs)
actVol_MC3 = monteCarloHedge(sim3, callSpecs, hedgeSpecs)
100%|██████████| 16384/16384 [01:06<00:00, 245.46it/s]
100%|██████████| 16384/16384 [02:16<00:00, 120.15it/s]
fig,axs = actVol_MC2.plotPnl()
fig.suptitle('Monte Carlo simulation of actual volatility hedging applied 8 times a day')
plt.show()
fig,axs = actVol_MC3.plotPnl()
fig.suptitle('Monte Carlo simulation of actual volatility hedging applied 16 times a day')
plt.show()
Single path simulations and decomposition of profits¶
The following simulations on single paths show the time evolution of the asset price, the values assumed by the Greeks and the cumulative mark-to-market profits.
# Select an index identifying a realisation
idx = 109
path = sim1.getSinglePath(idx)
callSpecs = {'strike': K, 'riskFreeRate': r, 'impVol': impVol,}
hedgeSpecs = {'hedgeVol': impVol}
# Apply hedging strategy to a single path
impVolHedge = deltaHedge(path, **callSpecs, **hedgeSpecs)
fig, axs = impVolHedge.plot()
plt.show()
path2 = sim2.getSinglePath(idx)
impVolHedge2 = deltaHedge(path2, **callSpecs, **hedgeSpecs)
fig, axs = impVolHedge2.plot(showShades=False)
plt.show()
hedgeSpecs = {'hedgeVol': actualVol}
actVolHedge = deltaHedge(path, **callSpecs, **hedgeSpecs)
actVolHedge.plot()
plt.show()
Minimum Variance Delta¶
This chapter is devoted to analyzing and replicating the approach followed by Hull and White to develop an empirical mathematical model for a delta value that effectively minimizes the variance of changes in a hedged position. Market data for options on the S&P 500 index, spanning from 2010 to 2023, are utilized for this purpose. The analysis is focused on European call options. The dataset is sourced from OptionsDX.com and consists of more than 15 million entries: end-of-day quotations for both call and put options, covering a range of strikes and maturities for each trading day, together with values for the Greeks and the last quotation of the underlying asset.
Loading and processing data¶
The initial phase of the analysis involves loading and processing the dataset, a task that is computationally intensive. To optimize efficiency and minimize the need for redundant computations, both the preprocessed and fully processed data are saved in the designated data folder.
The preprocessing phase includes several key operations:
- Exclusion of data related to put options.
- Grouping of quotations with identical expiry dates and strike prices by assigning them to a common CONTRACT identifier.
- Calculation of the call price as the average of the bid and ask quotations.
- Filling in any missing information on the underlying asset's price.
- Elimination of extraneous columns that are not required for the analysis.
Subsequent to preprocessing, the processing phase involves:
- Grouping quotations by contract and computing daily changes in both the price and implied volatility.
- Categorization of each quotation according to its maturity and assignment to specific moneyness categories.
- Removal of rows containing invalid or incomplete data.
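The grouping and differencing steps above might look roughly like the following sketch; the column names follow those used later in the analysis, while `BID`, `ASK` and `IV` are assumed names for the raw fields:

```python
import pandas as pd

def process_quotes(df):
    """Sketch of the processing step: per-contract daily changes."""
    df = df.sort_values(['CONTRACT', 'QUOTE_DATE']).copy()
    df['PRICE'] = (df['BID'] + df['ASK']) / 2        # mid quote as call price
    grouped = df.groupby('CONTRACT')
    df['PRICE_CHANGE'] = grouped['PRICE'].diff()     # daily change in option price
    df['UNDERLYING_CHANGE'] = grouped['UNDERLYING_PRICE'].diff()
    df['IV_CHANGE'] = grouped['IV'].diff()           # daily change in implied vol
    # drop rows with invalid or incomplete data
    return df.dropna(subset=['PRICE_CHANGE', 'UNDERLYING_CHANGE'])
```

Grouping by contract before differencing ensures that the first quotation of each contract never inherits a change from a different contract; that first row is then dropped as incomplete.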
start_date = '2010-01'
end_date = '2023-12'
data_path = 'data'
ticker = 'SPX'
try:
    processedOptionChain = odh.loadProcessedData(ticker, data_path, startDate=start_date, endDate=end_date)
except FileNotFoundError:
    df = odh.loadAndPreprocessData(ticker, data_path, startDate=start_date, endDate=end_date)
    processedOptionChain = odh.processData(df)
    odh.saveProcessedData(processedOptionChain, ticker, data_path)
Loading processed data from data/SPX/processed/spx_processed_201001_to_202312.csv
Processed data loaded successfully
The dataframe contains 2726284 rows
Data analysis¶
A first exploratory analysis is executed on a smaller portion of the dataset.
analysisWindowSize = 12
analysisDate = '2018-12'
preliminaryAnalysis_df = odh.getTrainingWindow(processedOptionChain, analysisDate, analysisWindowSize)
The change in option price is plotted here against the change in underlying price, colored by delta. Most of the data lies between the horizontal axis and the line with slope 1.
fig,ax = plt.subplots(1,1,)
sns.scatterplot(x='UNDERLYING_CHANGE', y='PRICE_CHANGE', data=preliminaryAnalysis_df, hue='DELTA', alpha=0.5, palette='coolwarm', ax=ax)
ax.set_xlabel(r'$\Delta S$')
ax.set_ylabel(r'$\Delta V$')
ax.set_title('Change in option price vs change in underlying price')
x = np.linspace(preliminaryAnalysis_df['UNDERLYING_CHANGE'].min(), preliminaryAnalysis_df['UNDERLYING_CHANGE'].max(), 2)
ax.plot(x, x, color='black', linestyle='--')
plt.show()
This plot shows the option price change as a function of delta. The colormap represents the rate of return of the underlying asset.
returnRate = preliminaryAnalysis_df['UNDERLYING_CHANGE']/preliminaryAnalysis_df['UNDERLYING_PRICE']
sns.scatterplot(data=preliminaryAnalysis_df, x='DELTA', y='PRICE_CHANGE',hue=returnRate, )
plt.title('Distribution of change in option price with respect to delta')
plt.show()
The plot on the left side displays vega as a function of delta. On the right, vega is scaled by the asset price and the square root of the time to maturity.
fig, ax = plt.subplots(1, 2)
sns.scatterplot(data=preliminaryAnalysis_df, x='DELTA', y='VEGA', hue='MATURITY_BUCKET', ax=ax[0], legend=False)
sns.scatterplot(data=preliminaryAnalysis_df, x='DELTA', y=preliminaryAnalysis_df['VEGA']/(preliminaryAnalysis_df['UNDERLYING_PRICE']*np.sqrt(preliminaryAnalysis_df['T'])), hue='MATURITY_BUCKET', ax=ax[1])
ax[0].set_title('Vega vs Delta')
ax[0].set_xlabel(r'$\Delta$')
ax[0].set_ylabel(r'$\nu$')
ax[1].set_title('Normalized Vega vs Delta')
ax[1].set_xlabel(r'$\Delta$')
ax[1].set_ylabel(r'$\frac{\nu}{S \sqrt{T}}$')
plt.show()
Data cleaning¶
It is necessary to perform this analysis on the whole dataset, because a dirty dataset would degrade the quality of the mathematical model.
As an example, the following scatterplots show the relationships between different features of the dataset. An unusual pattern is a signal that further cleaning is needed.
For instance, the last portion of the data exhibits unusually large negative values for vega, which should be a positive quantity. Those entries therefore need to be discarded.
Linear estimation of Minimum Variance Delta¶
The Minimum Variance Delta is estimated here with a least-squares method that minimizes the variance of the hedging error $\varepsilon$ in $\delta V = \Delta_{MV} \delta S + \varepsilon$. The estimation is performed for December 2018 using data from the preceding 36 months.
linearRegressionWindowSize = 36
linearRegressionDate = '2018-12'
preliminaryRegression_df = odh.getTrainingWindow(processedOptionChain, linearRegressionDate, linearRegressionWindowSize)
The estimation is performed separately for each moneyness and maturity bucket.
# Create a list of buckets
deltaBucketList = preliminaryRegression_df['DELTA_BUCKET'].unique()
maturityBucketList = preliminaryRegression_df['MATURITY_BUCKET'].unique()
deltaBucketList.sort()
# Create dataframe for storing the regression results
deltaMV_linear = pd.DataFrame(columns=maturityBucketList, index=deltaBucketList)
# Iterate over each bucket combination
for dteBucket in maturityBucketList:
    for deltaBucket in deltaBucketList:
        mask = (preliminaryRegression_df['DELTA_BUCKET'] == deltaBucket) & (preliminaryRegression_df['MATURITY_BUCKET'] == dteBucket)
        analysis_df_filtered = preliminaryRegression_df[mask]
        X = analysis_df_filtered['UNDERLYING_CHANGE'].values
        y = analysis_df_filtered['PRICE_CHANGE'].values
        # Residuals of the single-parameter linear model dV = d * dS
        def residuals(d, x, y):
            return y - d*x
        d0 = 0.0
        result = least_squares(residuals, d0, args=(X, y))
        deltaMV_linear.loc[deltaBucket, dteBucket] = result.x[0]
deltaMV_linear
| Delta bucket | 1M | 3M | 6M | 12M | 12M+ |
|---|---|---|---|---|---|
| 0.100 | 0.064 | 0.058 | 0.065 | 0.074 | 0.083 |
| 0.200 | 0.149 | 0.146 | 0.161 | 0.177 | 0.195 |
| 0.300 | 0.236 | 0.238 | 0.263 | 0.273 | 0.310 |
| 0.400 | 0.323 | 0.343 | 0.370 | 0.375 | 0.423 |
| 0.500 | 0.429 | 0.440 | 0.468 | 0.474 | 0.519 |
| 0.600 | 0.529 | 0.553 | 0.571 | 0.582 | 0.630 |
| 0.700 | 0.622 | 0.656 | 0.693 | 0.669 | 0.737 |
| 0.800 | 0.772 | 0.775 | 0.788 | 0.800 | 0.843 |
| 0.900 | 0.914 | 0.913 | 0.931 | 0.920 | 0.921 |
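The single-parameter fit above is ordinary least squares through the origin, so `least_squares` should agree with the closed-form slope $\sum_i x_i y_i / \sum_i x_i^2$. A quick check on synthetic data:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.4 * x + 0.05 * rng.normal(size=500)    # true slope 0.4 plus noise

# Numerical fit of the slope-through-origin model
fit = least_squares(lambda d: y - d * x, x0=0.0)
# Closed-form OLS slope through the origin
closed_form = (x @ y) / (x @ x)
print(fit.x[0], closed_form)
```

The two estimates coincide up to solver tolerance; the iterative solver is used in the notebook for uniformity with the later nonlinear fits.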
In the plot below, the dashed line represents the condition $\Delta_{MV} = \Delta_{BS}$. For the period under examination, Minimum Variance Delta is always lower than Black-Scholes delta.
deltaMV_linear.plot()
plt.plot([0, 1], [0, 1], color='black', linestyle='--', linewidth=1)
plt.xlabel('Delta bucket')
plt.ylabel('Minimum variance delta')
plt.title('Minimum variance delta for each delta bucket and maturity')
plt.legend()
plt.show()
The next plot shows the difference between the two values as a function of $\Delta_{BS}$. In Hull and White's findings, this relationship is close to quadratic and does not depend heavily on maturity. That does not appear to hold here, hence the choice to adopt separate models for separate maturity buckets. Adopting a cubic model could be another interesting direction to investigate.
deltaDifference = deltaMV_linear - deltaMV_linear.index.values.reshape(-1, 1)
deltaDifference.plot()
plt.xlabel('Delta bucket')
plt.ylabel(r'$\Delta_{MV} - \Delta_{BS}$')
plt.title('Difference between MV delta and delta bucket midpoint')
plt.show()
Model fitting¶
The calibration of the model on the historical data is performed with a sliding-window approach, fitting each curve to the data from the 36 preceding months. The results are stored in a dataframe with one row per evaluation date and multi-indexed columns that group the three coefficients by maturity bucket.
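Each window's fit follows Hull and White's quadratic form, $\delta V - \Delta_{BS}\,\delta S = \frac{\nu\,\delta S}{S\sqrt{T}}\left(a + b\,\Delta_{BS} + c\,\Delta_{BS}^2\right) + \varepsilon$, which is linear in $(a, b, c)$ and can therefore be solved by ordinary least squares. A sketch with illustrative names (the actual fit lives inside `odh.rollingFit`):

```python
import numpy as np

def fit_mv_coefficients(delta, vega, S, T, dS, dV):
    """Fit a, b, c in dV - delta*dS = vega*dS/(S*sqrt(T)) * (a + b*delta + c*delta**2)."""
    target = dV - delta * dS
    scale = vega * dS / (S * np.sqrt(T))
    # Design matrix: scaled powers of delta (columns 1, delta, delta^2)
    A = scale[:, None] * np.vander(delta, 3, increasing=True)
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coeffs    # array [a, b, c]
```

This matches the evaluation used later in the notebook, where the fitted polynomial is applied via `np.polyval` with the coefficient order reversed.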
movingWindowSize = 36
byMaturity = True
modelHistory = odh.rollingFit(processedOptionChain, windowSize=movingWindowSize, byMaturity=byMaturity)
modelHistory.head()
Fitting model: 100%|██████████| 132/132 [06:37<00:00, 3.01s/it, Evaluation date=2023-12]
Model coefficients fitted successfully for the time window from 2013-01 to 2023-12
| Evaluation date | 1M a | 1M b | 1M c | 3M a | 3M b | 3M c | 6M a | 6M b | 6M c | 12M a | 12M b | 12M c | 12M+ a | 12M+ b | 12M+ c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-01-01 | -19.869 | 59.224 | -71.282 | -28.401 | 64.569 | -78.487 | -32.527 | 71.521 | -87.561 | -34.077 | 69.592 | -89.423 | -34.898 | 63.306 | -86.496 |
| 2013-02-01 | -19.931 | 60.069 | -71.433 | -28.284 | 64.669 | -78.670 | -32.290 | 70.534 | -86.283 | -34.132 | 69.995 | -89.646 | -34.719 | 62.344 | -85.432 |
| 2013-03-01 | -20.941 | 62.327 | -73.394 | -28.190 | 63.006 | -77.206 | -32.357 | 70.716 | -86.934 | -34.166 | 69.650 | -89.114 | -34.691 | 61.906 | -84.811 |
| 2013-04-01 | -20.958 | 62.050 | -72.876 | -28.209 | 63.270 | -77.729 | -32.470 | 71.316 | -87.814 | -34.221 | 69.859 | -89.513 | -34.641 | 61.447 | -84.532 |
| 2013-05-01 | -21.091 | 56.093 | -65.015 | -28.417 | 65.184 | -80.466 | -32.673 | 72.394 | -89.230 | -34.372 | 71.023 | -91.272 | -34.488 | 61.864 | -86.059 |
The evolution over time of the coefficients for the 6M bucket is depicted below. The different scale of the coefficients with respect to the results obtained by Hull and White is probably due to some normalization they introduced in their model.
def improve_legend(ax=None):
    if ax is None:
        ax = plt.gca()
    for line in ax.lines:
        data_x, data_y = line.get_data()
        right_most_x = data_x[-1]
        right_most_y = data_y[-1]
        ax.annotate(
            line.get_label(),
            xy=(right_most_x, right_most_y),
            xytext=(5, 0),
            textcoords="offset points",
            va="center",
            color=line.get_color(),
            usetex=True,
        )
    ax.legend().set_visible(False)
maturityBucket = '6M' if byMaturity else 'All'
fig, ax = plt.subplots()
plot_style = {'marker': 'o', 'markersize': 3, 'linewidth': 0.75}
modelHistory.loc[:, (maturityBucket, 'a')].plot(ax=ax, label='a', **plot_style)
modelHistory.loc[:, (maturityBucket, 'b')].apply(lambda x: -x).plot(ax=ax, label='-b', **plot_style)
modelHistory.loc[:, (maturityBucket, 'c')].plot(ax=ax, label='c', **plot_style)
ax.set_xlabel('Date')
ax.set_ylabel('Coefficient')
ax.set_title(f'Historical evolution of the coefficients for the {maturityBucket} bucket')
improve_legend(ax)
plt.show()
The next plot summarizes the time evolution of the coefficients for all maturities.
if byMaturity:
    fig, ax = plt.subplots()
    for coeff, color in zip(['a', 'b', 'c'], ['tab:blue', 'tab:orange', 'tab:green']):
        for maturity, alpha in zip(MATURITY_BUCKETS_LABELS, np.linspace(1, 0.3, len(MATURITY_BUCKETS_LABELS))):
            modelHistory.loc[:, (maturity, coeff)].apply(lambda x: -x if coeff=='b' else x).plot(ax=ax, label=f"{'-' if coeff=='b' else ''}{coeff} {maturity}", alpha=alpha, color=color, linewidth=0.75)
    ax.set_xlabel('Date')
    ax.set_ylabel('Coefficient')
    ax.set_title('Historical evolution of the coefficients')
    ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
pitValidationDate = linearRegressionDate
pitModel = modelHistory.loc[pitValidationDate].stack(level=0).droplevel(0)
pitModel.index = pitModel.index.astype(maturity_category)
pitModel.sort_index(inplace=True)
pitModel
| Maturity | a | b | c |
|---|---|---|---|
| 1M | -13.549 | -17.543 | 20.856 |
| 3M | -19.666 | 16.416 | -6.253 |
| 6M | -15.801 | 11.967 | 7.906 |
| 12M | -9.525 | -0.787 | 13.220 |
| 12M+ | -6.070 | 20.658 | 5.048 |
Then the Minimum Variance delta and the hedging errors for both strategies are evaluated on the 36-month window, to assess the quality of the fit, and on the out-of-sample data. In this particular case the model performs best on the validation data.
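The gain metric reported below is Hull and White's measure of variance reduction, $\mathrm{Gain} = 1 - SSE(\Delta_{MV})/SSE(\Delta_{BS})$, i.e. the fraction of the Black-Scholes hedging-error sum of squares removed by the minimum variance hedge. A minimal sketch (the function name is illustrative; the actual computation lives in `odh.pitValidation`):

```python
import numpy as np

def hedging_gain(bs_errors, mv_errors):
    """Fraction of the BS hedging-error sum of squares removed by the MV hedge."""
    sse_bs = np.sum(np.square(bs_errors))
    sse_mv = np.sum(np.square(mv_errors))
    return 1.0 - sse_mv / sse_bs

# Halving every error removes 75% of the sum of squares
bs = np.array([1.0, -2.0, 0.5])
print(f'{hedging_gain(bs, bs / 2):.0%}')   # 75%
```

A positive gain means the minimum variance hedge outperforms the plain Black-Scholes delta hedge; a negative gain means it does worse.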
trainingData, validationData = odh.getValidationData(processedOptionChain, pitModel, pitValidationDate, windowSize=movingWindowSize, byMaturity=byMaturity)
pitTrainingMetrics, pitValidationMetrics = odh.pitValidation(trainingData, validationData)
print(f"Training gain: {pitTrainingMetrics['overallGain']:.4%}")
print(f"Validation gain: {pitValidationMetrics['overallGain']:.4%}")
Training gain: 17.0981%
Validation gain: 34.5677%
The following heatmaps effectively show the gains for each combination of buckets.
from matplotlib.colors import TwoSlopeNorm
from matplotlib.ticker import PercentFormatter
plt.rcParams.update({'text.usetex': False})
# Calculate global absolute max
vmax = max(abs(pitTrainingMetrics['gainDf'].max().max()), abs(pitTrainingMetrics['gainDf'].min().min()), abs(pitValidationMetrics['gainDf'].max().max()), abs(pitValidationMetrics['gainDf'].min().min()))
vmin = -vmax
# Create centered colormap
norm = TwoSlopeNorm(vmin=vmin, vcenter=0, vmax=vmax)
# Format the colorbar
formatter = PercentFormatter(xmax=1, decimals=1, symbol='%',)
fig = plt.figure()
rows, cols = pitTrainingMetrics['gainDf'].shape
grid = plt.GridSpec(nrows=(rows+1), ncols=2*(cols+1)+1, hspace=0.2, wspace=0.2)
main_ax_1 = fig.add_subplot(grid[:-1, :cols])
delta_ax_1 = fig.add_subplot(grid[:-1, cols], sharey=main_ax_1)
maturity_ax_1 = fig.add_subplot(grid[-1, :cols], sharex=main_ax_1)
overall_ax_1 = fig.add_subplot(grid[-1, cols], sharex=delta_ax_1, sharey=maturity_ax_1)
main_ax_2 = fig.add_subplot(grid[:-1, -(1+cols):-1])
delta_ax_2 = fig.add_subplot(grid[:-1, -1], sharey=main_ax_2)
maturity_ax_2 = fig.add_subplot(grid[-1, -(1+cols):-1], sharex=main_ax_2)
overall_ax_2 = fig.add_subplot(grid[-1, -1], sharex=delta_ax_2, sharey=maturity_ax_2)
cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7]) # Define the position of the colorbar
sns.heatmap(pitTrainingMetrics['gainDf'], annot=True, fmt='.1%', cmap='coolwarm', ax=main_ax_1, cbar_ax=cbar_ax, norm=norm,)
sns.heatmap(pitTrainingMetrics['gainByDelta'].to_frame(), annot=True, fmt='.1%', cmap='coolwarm', ax=delta_ax_1, cbar=False, norm=norm)
sns.heatmap(pitTrainingMetrics['gainByMaturity'].to_frame().T, annot=True, fmt='.1%', cmap='coolwarm', ax=maturity_ax_1, cbar=False, norm=norm)
sns.heatmap(pd.DataFrame(pitTrainingMetrics['overallGain'], index =['All'], columns=['All']), annot=True, fmt='.1%', cmap='coolwarm', ax=overall_ax_1, cbar=False, norm=norm)
sns.heatmap(pitValidationMetrics['gainDf'], annot=True, fmt='.1%', cmap='coolwarm', ax=main_ax_2, cbar_ax=cbar_ax, norm=norm,)
sns.heatmap(pitValidationMetrics['gainByDelta'].to_frame(), annot=True, fmt='.1%', cmap='coolwarm', ax=delta_ax_2, cbar=False, norm=norm)
sns.heatmap(pitValidationMetrics['gainByMaturity'].to_frame().T, annot=True, fmt='.1%', cmap='coolwarm', ax=maturity_ax_2, cbar=False, norm=norm)
sns.heatmap(pd.DataFrame(pitValidationMetrics['overallGain'], index =['All'], columns=['All']), annot=True, fmt='.1%', cmap='coolwarm', ax=overall_ax_2, cbar=False, norm=norm)
# Hide redundant ticks and tick labels on the inner subplots
main_ax_1.tick_params(axis='both', which='both', bottom=False, labelbottom=False, labelrotation=0)
delta_ax_1.tick_params(axis='both', which='both', bottom=False, left=False, labelleft=False, labelbottom=False)
maturity_ax_1.tick_params(axis='both', which='both', labelrotation=0)
overall_ax_1.tick_params(axis='both', which='both', left=False, labelleft=False)
# Keep axis labels only on the outer subplots
for ax in [main_ax_1, delta_ax_1, maturity_ax_1, overall_ax_1]:
    ax.set_xlabel('Maturity' if ax == maturity_ax_1 else '')
    ax.set_ylabel('Delta' if ax == main_ax_1 else '')
# Hide redundant ticks and tick labels on the inner subplots
main_ax_2.tick_params(axis='both', which='both', bottom=False, labelbottom=False, labelrotation=0)
delta_ax_2.tick_params(axis='both', which='both', bottom=False, left=False, labelleft=False, labelbottom=False)
maturity_ax_2.tick_params(axis='both', which='both', labelrotation=0)
overall_ax_2.tick_params(axis='both', which='both', left=False, labelleft=False)
# Keep axis labels only on the outer subplots
for ax in [main_ax_2, delta_ax_2, maturity_ax_2, overall_ax_2]:
    ax.set_xlabel('Maturity' if ax == maturity_ax_2 else '')
    ax.set_ylabel('Delta' if ax == main_ax_2 else '')
main_ax_1.set_title('Training set')
main_ax_2.set_title('Validation set')
cbar_ax.yaxis.set_major_formatter(formatter)
fig.suptitle(f'Gains evaluated on {pitValidationDate}')
plt.show()
plt.rcParams.update({'text.usetex': True})
This figure shows the relationship between the change in option price, the Black-Scholes delta and the daily returns of the underlying (colormap). In red, the fitted model is evaluated on the data of a quote sampled from the validation dataset.
returnRate = validationData['UNDERLYING_CHANGE']/validationData['UNDERLYING_PRICE']
sns.scatterplot(data=validationData, x='DELTA', y='PRICE_CHANGE',hue=returnRate, )
sampleQuote = validationData.iloc[883]
x = np.linspace(0,1,100)
y = sampleQuote['UNDERLYING_CHANGE']*x + sampleQuote['VEGA']*sampleQuote['UNDERLYING_CHANGE']/(np.sqrt(sampleQuote['T'])*sampleQuote['UNDERLYING_PRICE'])*np.polyval(pitModel.loc['6M'][::-1], x)
plt.plot(x, y, color='tab:red', linewidth=2)
print(f"{sampleQuote['UNDERLYING_CHANGE']/sampleQuote['UNDERLYING_PRICE'] = }")
plt.xlabel(r'$\Delta_{BS}$')
plt.ylabel(r'$\delta V$')
plt.title('Distribution of change in option price with respect to delta')
plt.show()
sampleQuote['UNDERLYING_CHANGE']/sampleQuote['UNDERLYING_PRICE'] = -0.023020151909255337
Moving window validation¶
The validation of the calibrated model is run on the entire dataset using the same 36-month moving window.
rawTrainingMetrics, rawValidationMetrics = odh.getRawLongitudinalMetrics(processedOptionChain, modelHistory, windowSize=movingWindowSize, byMaturity=byMaturity)
overallGain_history = odh.aggregateLongitudinalMetrics(processedOptionChain, rawTrainingMetrics, rawValidationMetrics)
Rolling validation of the model: 100%|██████████| 132/132 [00:47<00:00, 2.77it/s, Evaluation date=2023-12]
Model validated for the time window from 2013-01 to 2023-12
The evolution of training and validation gains over time is shown next. The average gain quite often takes values below zero, indicating a general worsening in performance with respect to the Black-Scholes delta.
overallGain_history.plot(**plot_style)
improve_legend()
plt.ylabel('Gain')
plt.title(f'Historical evolution of the validation gains')
plt.show()
The time evolution of training gains shown below was helpful for detecting some anomalies in the data. Not all the problems stemming from the poor quality of the data source have been addressed, though.
# Plot the 0.1 delta data
fig, ax = plt.subplots(figsize=(10, 6))
rawTrainingMetrics['gains_history'].loc[:, ('6M', slice(None))].plot(ax=ax, **plot_style)
improve_legend(ax)
plt.ylabel('Gain')
plt.title(f'Historical evolution of the training gains for the 6M maturity bucket')
plt.show()
This last heatmap shows the overall gains for the whole dataset, divided by maturity and moneyness buckets.
globalValidationResults = odh.globalValidation(rawValidationMetrics)
plt.rcParams.update({'text.usetex': False})
# Calculate global absolute max
vmax = max(abs(globalValidationResults['granularGains'].max().max()), abs(globalValidationResults['granularGains'].min().min()))
vmin = -vmax
# Create centered colormap
norm = TwoSlopeNorm(vmin=vmin, vcenter=0, vmax=vmax)
# Format the colorbar
formatter = PercentFormatter(xmax=1, decimals=1, symbol='%',)
fig = plt.figure()
rows, cols = pitTrainingMetrics['gainDf'].shape
grid = plt.GridSpec(nrows=(rows+1), ncols=(cols+1), hspace=0.2, wspace=0.2)
main_ax_1 = fig.add_subplot(grid[:-1, :cols])
delta_ax_1 = fig.add_subplot(grid[:-1, cols], sharey=main_ax_1)
maturity_ax_1 = fig.add_subplot(grid[-1, :cols], sharex=main_ax_1)
overall_ax_1 = fig.add_subplot(grid[-1, cols], sharex=delta_ax_1, sharey=maturity_ax_1)
cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7]) # Define the position of the colorbar
sns.heatmap(globalValidationResults['granularGains'], annot=True, fmt='.1%', cmap='coolwarm', ax=main_ax_1, cbar_ax=cbar_ax, norm=norm,)
sns.heatmap(globalValidationResults['deltaGains'].to_frame(), annot=True, fmt='.1%', cmap='coolwarm', ax=delta_ax_1, cbar=False, norm=norm)
sns.heatmap(globalValidationResults['maturityGains'].to_frame().T, annot=True, fmt='.1%', cmap='coolwarm', ax=maturity_ax_1, cbar=False, norm=norm)
sns.heatmap(pd.DataFrame(globalValidationResults['overallGain'], index =['All'], columns=['All']), annot=True, fmt='.1%', cmap='coolwarm', ax=overall_ax_1, cbar=False, norm=norm)
# Hide redundant ticks and tick labels on the inner subplots
main_ax_1.tick_params(axis='both', which='both', bottom=False, labelbottom=False, labelrotation=0)
delta_ax_1.tick_params(axis='both', which='both', bottom=False, left=False, labelleft=False, labelbottom=False)
maturity_ax_1.tick_params(axis='both', which='both', labelrotation=0)
overall_ax_1.tick_params(axis='both', which='both', left=False, labelleft=False)
# Keep axis labels only on the outer subplots
for ax in [main_ax_1, delta_ax_1, maturity_ax_1, overall_ax_1]:
    ax.set_xlabel('Maturity' if ax == maturity_ax_1 else '')
    ax.set_ylabel('Delta' if ax == main_ax_1 else '')
cbar_ax.yaxis.set_major_formatter(formatter)
fig.suptitle('Validation gains evaluated on the whole dataset')
plt.show()
plt.rcParams.update({'text.usetex': True})
Hedging performance¶
The following two plots show the relationship between Black-Scholes delta and MV delta, based on the model evaluated in December 2018.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True)
sns.scatterplot(data=trainingData, x='DELTA', y='DELTA_MV', hue='MATURITY_BUCKET', ax=axes[0], legend=False)
axes[0].plot([0, 1], [0, 1], color='black', linestyle='--')
axes[0].set_title('Training set')
sns.scatterplot(data=validationData, x='DELTA', y='DELTA_MV', hue='MATURITY_BUCKET', ax=axes[1])
axes[1].plot([0, 1], [0, 1], color='black', linestyle='--')
axes[1].set_title('Validation set')
axes[0].set_xlabel(r'$\Delta_{BS}$')
axes[0].set_ylabel(r'$\Delta_{MV}$')
axes[1].set_xlabel(r'$\Delta_{BS}$')
fig.suptitle('Minimum variance delta as a function of the Black-Scholes delta')
plt.show()
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True)
sns.scatterplot(data=trainingData, x='DELTA', y='DELTA_DIFF', hue='MATURITY_BUCKET', ax=axes[0], legend=False)
axes[0].set_title('Training data')
sns.scatterplot(data=validationData, x='DELTA', y='DELTA_DIFF', hue='MATURITY_BUCKET', ax=axes[1])
axes[1].set_title('Validation data')
axes[0].set_xlabel(r'$\Delta_{BS}$')
axes[0].set_ylabel(r'$\Delta_{MV} - \Delta_{BS}$')
axes[1].set_xlabel(r'$\Delta_{BS}$')
fig.suptitle('Difference between minimum variance delta and Black-Scholes delta')
plt.show()
Here the difference $\Delta_{MV} - \Delta_{BS}$ implied by the model evaluated in December 2018 is plotted together with the values obtained from the linear estimation $\delta V = \Delta_{MV} \delta S + \varepsilon$.
deltaDifference.plot(legend=False)
sns.scatterplot(data=validationData, x='DELTA', y='DELTA_DIFF', hue='MATURITY_BUCKET', alpha=0.2)
plt.xlabel(r'$\Delta_{BS}$')
plt.ylabel(r'$\Delta_{MV} - \Delta_{BS}$')
plt.title('Difference between MV delta and BS delta')
plt.show()
This figure illustrates the distribution of hedging errors for different values of delta and shows that variance is effectively reduced for out-of-the-money call options when hedging with the minimum variance delta.
hedgingErrors = pd.melt(validationData, id_vars=['DELTA_BUCKET'], value_vars=['BS_ERROR', 'MV_ERROR'], var_name='Hedging_strategy', value_name='Hedging_error')
# Scatter plot of hedging errors vs delta
fig, axes = plt.subplots(1, 2, sharey=True)
sns.scatterplot(data=validationData, x='DELTA', y='BS_ERROR', alpha=0.4, ax=axes[0])
sns.scatterplot(data=validationData, x='DELTA', y='MV_ERROR', alpha=0.2, ax=axes[0])
sns.boxplot(data=hedgingErrors, x='DELTA_BUCKET', y='Hedging_error', hue='Hedging_strategy', ax=axes[1])
axes[0].set_xlabel(r'$\Delta_{BS}$')
axes[0].set_ylabel('Hedging error')
axes[1].set_xlabel(r'$\Delta_{BS}$')
axes[1].set_ylabel('')
fig.suptitle('Distribution of hedging errors as a function of BS delta')
plt.show()